Subscription churn prediction

ABSTRACT

A churn prediction system includes at least one hardware processor, a memory including a historical sample set of subscriber data, and a churn prediction engine executing on the at least one hardware processor. The churn prediction engine is configured to identify the historical sample set, identify a set of attributes, automatically select a subset of attributes based on an information gain value, and generate a decision tree by recursively generating nodes of the decision tree, each node generated by computing an information gain value for each remaining attribute of the subset of attributes, identifying a highest attribute having the highest information gain value, and assigning the highest attribute to the node. The churn prediction engine is also configured to receive target data for a target subscriber, apply the target data to the decision tree, thereby generating a churn prediction for the target subscriber, and identify the target subscriber as a churn risk based on the churn prediction.

CROSS REFERENCES

This application is a continuation of U.S. patent application Ser. No. 14/986,476 by Vadakattu et al., entitled "Subscription Churn Prediction," filed on Dec. 31, 2015, which claims the priority benefit of India Patent Application No. 3353/CHE/2015 by Vadakattu et al., filed on Jul. 1, 2015; each of which is incorporated herein by reference in its entirety.

TECHNICAL FIELD

Embodiments of the present disclosure relate generally to subscription services in an online content system and, more particularly, but not by way of limitation, to predicting user turnover or churn in an online subscription service.

BACKGROUND

Some online service providers enable their users, or "subscribers," to enroll in a subscription-based service. For example, an online marketplace may offer a subscription service to some of its customers, such as its sellers. These subscribers pay for their services for a period of time rather than, for example, per transaction, per listing, or per other unit of service. Over time, some subscribers may cancel their subscriptions, a phenomenon commonly known as "customer churn," or simply "churn."

BRIEF DESCRIPTION OF THE DRAWINGS

Various ones of the appended drawings merely illustrate example embodiments of the present disclosure and cannot be considered as limiting its scope.

FIG. 1 illustrates a network diagram depicting an example churn prediction system.

FIG. 2 is a block diagram showing components provided within the churn prediction engine according to some embodiments.

FIG. 3 illustrates a computerized method, in accordance with an example embodiment, for predicting churn.

FIG. 4 is a block diagram illustrating a representative software architecture, which may be used in conjunction with various hardware architectures herein described.

FIG. 5 is a block diagram illustrating components of a machine, according to some example embodiments, able to read instructions from a machine-readable medium (e.g., a machine-readable storage medium) and perform any one or more of the methodologies discussed herein.

The headings provided herein are merely for convenience and do not necessarily affect the scope or meaning of the terms used. Like numbers in the Figures indicate like components.

DETAILED DESCRIPTION

The description that follows includes systems, methods, techniques, instruction sequences, and computing machine program products that embody illustrative embodiments of the disclosure. In the following description, for the purposes of explanation, numerous specific details are set forth in order to provide an understanding of various embodiments of the inventive subject matter. It will be evident, however, to those skilled in the art, that embodiments of the inventive subject matter may be practiced without these specific details. In general, well-known instruction instances, protocols, structures, and techniques are not necessarily shown in detail.

For an online service provider offering a subscription model for their online service, it may be a costly endeavor to acquire new subscribers. For example, these costs may include advertising campaigns (e.g., to inform potential users of their services), adding additional services and features (e.g., to distinguish their services from their competitors), and educational services to ease new users into use (e.g., so they do not immediately get frustrated and abandon the service), to name but a few. It may be a less costly prospect to retain existing customers. As such, online service providers may focus on "customer retention." Further, online service providers may benefit from a system that can predict which of their existing subscription users may cancel their subscriptions in the near future, as they may then reach out to these customers proactively prior to cancellation.

A churn prediction engine and associated systems and methods for predicting subscriber churn are described herein. The churn prediction engine performs feature engineering methods to generate a decision tree model for churn prediction, sampling techniques for handling class imbalance, methods to address false alarms, and ranking algorithms for verifying the integrity of predictions.

As used herein, the term "subscriber churn," or just "churn," is used generally to refer to subscribers cancelling their subscriptions to an online service system. In other words, a level of churn refers to the level of turnover of subscribing users (e.g., the number of subscribers that cancel each month). For some online service providers, low churn is desired so as to retain as many of their existing customers as possible.

FIG. 1 illustrates a network diagram depicting an example churn prediction system 100. In the example embodiment, the churn prediction system 100 includes a networked system 102 that provides online subscription services to online users (or "subscribers"), such as a user 106 via a client device 110. The networked system 102 includes a churn prediction engine 150 for generating churn predictions for the subscribers as described herein. In some embodiments, a third party publication system 130 provides online subscription services to the online users, and the churn prediction engine 150 provides churn prediction services to that third party based on their subscribers' data.

The networked system 102 provides network-based, server-side functionality, via a network 104 (e.g., the Internet or a Wide Area Network (WAN)), to the client devices 110 that may be used, for example, by sellers or buyers (not separately shown) of products and services offered for sale through the publication system 142 (e.g., an online marketplace system, provided by publication systems 142 or payment systems 144). FIG. 1 further illustrates, for example, one or both of a web client 112 (e.g., a web browser), client application(s) 114, and a programmatic client 116 executing on the client device 110.

Each of the client devices 110 comprises a computing device that includes at least a display and communication capabilities with the network 104 to access the networked system 102. The client devices 110 include devices such as, but not limited to, work stations, computers, general purpose computers, Internet appliances, hand-held devices, wireless devices, portable devices, wearable computers, cellular or mobile phones, portable digital assistants (PDAs), smart phones, tablets, ultrabooks, netbooks, laptops, desktops, multi-processor systems, microprocessor-based or programmable consumer electronics, game consoles, set-top boxes, network PCs, mini-computers, and the like. Each of the client devices 110 connects with the network 104 via a wired or wireless connection. For example, one or more portions of the network 104 may be an ad hoc network, an intranet, an extranet, a virtual private network (VPN), a local area network (LAN), a wireless LAN (WLAN), a wide area network (WAN), a wireless WAN (WWAN), a metropolitan area network (MAN), a portion of the Internet, a portion of the Public Switched Telephone Network (PSTN), a cellular telephone network, a wireless network, a WiFi network, a WiMax network, another type of network, or a combination of two or more such networks.

Each of the client devices 110 includes one or more applications (also referred to as "apps") 114 such as, but not limited to, a web browser, a messaging application, an electronic mail (email) application, an e-commerce site application (also referred to as a marketplace application), and the like. In some embodiments, if the e-commerce site application is included in a given one of the client devices 110, then this application is configured to locally provide the user interface and at least some of the functionalities, with the application configured to communicate with the networked system 102, on an as-needed basis, for data or processing capabilities not locally available (e.g., access to a database of items available for sale, to authenticate a user, or to verify a method of payment). Conversely, if the e-commerce site application is not included in a given one of the client devices 110, the given one of the client devices 110 may use its web client 112 to access the e-commerce site (or a variant thereof) hosted on the networked system 102. Although only one client device 110 is shown in FIG. 1, two or more client devices 110 may be included in the churn prediction system 100.

An Application Program Interface (API) server 120 and a web server 122 are coupled to, and provide programmatic and web interfaces respectively to, one or more application servers 140. In the example embodiment, the application servers 140 host the churn prediction engine 150 that facilitates providing prediction services, as described herein. The application servers 140 are, in turn, shown to be coupled to one or more database servers 124 that facilitate access to one or more databases 126.

In some embodiments, the application servers 140 host one or more publication systems 142 and payment systems 144. The publication system 142 may provide a number of e-commerce functions and services to users that access the networked system 102 and/or external sites 130. E-commerce functions/services may include a number of publisher functions and services (e.g., search, listing, content viewing, payment, etc.). For example, the publication system 142 may provide a number of services and functions to users for listing goods and/or services or offers for goods or services for sale, searching for goods and services, facilitating transactions, and reviewing and providing feedback about transactions and associated users. Additionally, the publication system 142 may track and store data and metadata relating to listings, transactions, and user interactions. In some embodiments, the publication system 142 may publish or otherwise provide access to content items stored in the application servers 140 or databases 126 accessible to the application servers 140 or the database servers 124. The payment system 144 may likewise provide a number of payment services and functions to users. The payment system 144 may allow users to accumulate value (e.g., in a commercial currency, such as the U.S. dollar, or a proprietary currency, such as "points") in accounts, and then later to redeem the accumulated value for products or items (e.g., goods or services) that are made available via the publication system 142. While the publication system 142 and payment system 144 are shown in FIG. 1 to both form part of the networked system 102, it will be appreciated that, in alternative embodiments, the payment system 144 may form part of a payment service that is separate and distinct from the networked system 102. In other embodiments, the payment system 144 may be omitted from the churn prediction system 100. In some embodiments, at least a portion of the publication system 142 may be provided on the client devices 110.

Further, while the churn prediction system 100 shown in FIG. 1 employs a client-server architecture, embodiments of the present disclosure are not limited to such an architecture, and may equally well find application in, for example, a distributed or peer-to-peer architecture system. The various publication and payment systems 142 and 144 may also be implemented as standalone software programs, which do not necessarily have networking capabilities.

The client devices 110 access the various publication and payment systems 142 and 144 via the web interface supported by the web server 122. Similarly, the programmatic client 116 accesses the various services and functions provided by the publication and payment systems 142 and 144 via the programmatic interface provided by the API server 120. The programmatic client 116 may, for example, be a seller application (e.g., the TurboLister application developed by eBay Inc., of San Jose, Calif.) to enable sellers to author and manage listings on the networked system 102 in an off-line manner, and to perform batch-mode communications between the programmatic client 116 and the networked system 102.

In the example embodiment, the networked system 102 provides one or more subscription-based services to users 106. The users 106 may be sellers in an online marketplace (e.g., the publication system 142 and the payment system 144), receiving subscription services associated with the selling of products and services through the online marketplace to other users. During the course of operation, the networked system 102 collects various historical data associated with the activities of the sellers 106 (e.g., listings information, completed sales information, activity information, and so forth). The networked system 102 also collects information about when some of those sellers 106 subsequently cancelled their subscriptions, or churned. The churn prediction engine 150 analyzes this historical data across many sellers 106 to build a prediction model for predicting when sellers are likely to cancel. The prediction model may then be used with later sellers 106 (e.g., before they cancel their subscriptions) to identify those that may be "at risk" of cancelling.

FIG. 2 is a block diagram showing components provided within the churn prediction engine 150 according to some embodiments. The churn prediction engine 150 may be hosted on dedicated or shared server machines (not shown) that are communicatively coupled to enable communications between server machines. The components themselves are communicatively coupled (e.g., via appropriate interfaces) to each other and to various data sources, so as to allow information to be passed between the applications or so as to allow the applications to share and access common data. Furthermore, the components may access one or more databases 126 via the database servers 124 (both shown in FIG. 1).

The churn prediction engine 150 provides a number of churn prediction features whereby the churn prediction engine 150 analyzes user data associated with subscription users, generates a prediction graph, or "churn model," and applies additional features to improve performance of the churn model.

To this end, the example churn prediction engine 150 includes an attribute collection module 210, a feature engineering module 220, a model training module 230, a precision improvement module 240, a churn prediction module 250, and a prediction ranking module 260.

In analyzing and predicting churn for subscription users, some of the examples described herein analyze sellers in an online marketplace. For ease of understanding, presume the sellers are "monthly" subscribers (e.g., their enrollment is ongoing, they pay monthly, and they may cancel at any time, forfeiting at most a single month's subscription fee). As such, the subscribers may be divided into two classes each month: those that cancel their subscriptions, or "churn," during that month (referred to herein as "churns") and those that continue to maintain their active subscriptions (referred to herein as "survivors").

In the example embodiment, the attribute collection module 210 identifies user data associated with the subscription users 106 occurring in an online environment, such as the online services offered by the networked system 102. In some embodiments, the online environment is an online marketplace such as the publication system 142 or a third party marketplace supported by an external site 130.

Online sellers 106 generate user data over time and through use of the networked system 102, such as transactional data (e.g., data about their published listings, sales, revenue, inventory, and so forth) and site activity (e.g., web pages visited, number of logins, length of subscription, and so forth). This user data may be referred to herein, collectively, as "attribute data" or "feature data" of the subscription seller(s) 106. In some contexts, this attribute data may be "historical data" associated with a set of sellers that is used for model training (e.g., the input data used to train a prediction model). In other contexts, this attribute data may be "target user data" associated with a particular seller for which a churn prediction is computed (e.g., an application of the prediction model to the particular seller).

The attribute collection module 210 collects the user data of the subscription users (e.g., sellers), both for model training purposes and for model application purposes. In some embodiments, the attribute collection module 210 may retrieve data from the databases 126 of the networked system 102 (e.g., data as generated by the publication systems 142 and payment systems 144). In other embodiments, the attribute collection module 210 may retrieve or receive data from the third party publication system 130, or the client devices 110. The attribute collection module 210 provides this attribute data to the feature engineering module 220.

The feature engineering module 220 prepares the historical data prior to use in training the prediction model. The historical data processed by the feature engineering module 220 may be grouped into several classes of attributes (e.g., attributes of subscription sellers, in this example): (1) customer demographic information (e.g., seller's country of residence, seller's segment, and so forth); (2) subscription details (e.g., subscription service level, subscription contract period, and so forth); (3) event data (e.g., listings data, gross merchandise value (GMV), activity data such as quantity sold, revenue data, and so forth); (4) domain specific data (e.g., seller feedback rating, number of returns, number of repeat customers, and so forth); and (5) behavioral data (e.g., number of page visits, number of logins, and so forth). In some embodiments, the attributes may include one or more of the following:

FullTermgmvrevenue

Midtermgmvrevenue

Mostrecentgmvdip

Adaptivegmvavgdip

Midtermgmvavgdip

Numgmvzeroinfullterm (number of times a seller's GMV is zero in the last year)

Alltimegmvlow

Gmvstandarddeviation

GMVyearonyearchange

Lastgmvnormed

Fulltermactivelistingsrevenue

Midtermactivelistingsrevenue

Midfullactivelistingsratio

Mostrecentactivelistingsdip

Adaptiveactivelistingsavgdip

Midtermactivelistingsavgdip

Numactivelistingszeroinfullterm

Alltimeactivelistingslow

Activelistingsstandarddeviation

Activelistingsyearonyearchange

Lastactivelistingsnormed

SellerStandardLevel

Sellersegment

MonthlyoryearlySubscription

StoreLevel

Category

The feature engineering module 220 evaluates the historical data to identify a set of attributes (e.g., portions or fields of the historical data) to use in training the prediction model. In some embodiments, the feature engineering module 220 implements the Waikato Environment for Knowledge Analysis (WEKA) machine learning software and, more particularly, its "Attribute Evaluator" library, to perform attribute selection and ranking. The feature engineering module 220 computes an information gain value (IGV) for each of the attributes, then selects a set of attributes based, at least in part, on the information gain values. For example, Table 1 shows the top four attributes of an example subset of attributes:

TABLE 1
Information Gain Values

Attribute Name                                        IGV
Normalized active listings of seller for last month   0.16576
Dip in the active listings in the most recent month   0.13476
Normalized GMV                                        0.13177
Seller Store Age                                      0.02138

As such, and for example, the feature engineering module 220 may select the four attributes listed in Table 1 as training attributes for training the prediction model. The feature engineering module 220 may select a pre-defined number of attributes based on IGV (e.g., the top n attributes), or may select all attributes having an IGV above a pre-determined threshold (e.g., all attributes having IGV > x).
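
As an illustration of information-gain-based selection, the following R sketch computes an IGV for each attribute against the churn/survivor class label and keeps the top-ranked attributes. It assumes a data frame hist_data with a churn factor column and discretized attribute columns; the names and the threshold value are illustrative only, and in practice the WEKA Attribute Evaluator would perform this step.

    # Shannon entropy of a class vector (in bits).
    entropy <- function(y) {
      p <- table(y) / length(y)
      -sum(p[p > 0] * log2(p[p > 0]))
    }

    # Information gain of attribute x with respect to class y:
    # IG(Y; X) = H(Y) - sum_v P(X = v) * H(Y | X = v).
    info_gain <- function(x, y) {
      cond <- sum(sapply(split(y, x),
                         function(ys) (length(ys) / length(y)) * entropy(ys)))
      entropy(y) - cond
    }

    attrs <- setdiff(names(hist_data), "churn")
    igv <- sapply(attrs, function(a) info_gain(hist_data[[a]], hist_data$churn))
    top_n   <- names(sort(igv, decreasing = TRUE))[1:4]  # top-n selection
    above_x <- names(igv[igv > 0.02])                    # threshold selection (x = 0.02 assumed)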

The feature engineering module 220 may construct or compute one or more attributes from the subscriber data. These custom attributes are referred to herein as "fabricated attributes." The fabricated attributes may, for example, be computed based on one or more other attributes (e.g., "past four months' sales, moving average," or "number of consecutive months of $0 in sales"). These computations may be made in application RAM, partly in Python and partly in R. Once computed, these fabricated attributes may also be candidates for selection.
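
For example, the R sketch below fabricates the two attributes mentioned above from hypothetical monthly GMV columns gmv_1 through gmv_12 (oldest to most recent); the column names and window sizes are assumptions for illustration.

    gmv <- as.matrix(hist_data[, paste0("gmv_", 1:12)])

    # Moving average of sales over the past four months.
    hist_data$gmv_ma4 <- rowMeans(gmv[, 9:12])

    # Number of consecutive most-recent months with $0 in sales.
    zero_streak <- function(v) {
      r <- rle(rev(v) == 0)
      if (r$values[1]) r$lengths[1] else 0
    }
    hist_data$zero_sales_streak <- apply(gmv, 1, zero_streak)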

In some embodiments, one or more of these fabricated attributes, or any of the other attributes, may be pre-identified (e.g., by an analyst) for inclusion in the selected set of attributes. In other words, some of the attributes may be used regardless of their IGV. For example, an analyst may engage some past (e.g., churned) subscribers directly, or through online inputs, to determine why they cancelled their subscriptions. The analyst may generate some insight into a new factor for churn that may be influential in predicting churn. As such, the analyst may generate a fabricated attribute, or identify an already-existing attribute, and may force that attribute to be used in building the prediction model.

In some embodiments, the feature engineering module 220 calculates a baseline accuracy. With some subscription services, cancellation may be an uncommon or rare event. As such, the baseline accuracy may indicate a class imbalance within the training data set (e.g., many more survivors than churns). Analysis under such situations may lead to the phenomenon known as the "base rate fallacy." This result may affect how the training data is selected from all of the historical data.

In some embodiments, the training data set may be selected as a subset of random data points from the historical data. In the example embodiment (e.g., in situations in which the training data exhibits class imbalance), the feature engineering module 220 may equisample both churn and survivor classes to form the training data set (e.g., undersampling the majority class of survivors, and oversampling the minority class of churns). In other embodiments, the training data set may be selected such as to artificially alter the minority class in relation to the majority class, such as, for example, through the Synthetic Minority Over-sampling Technique (SMOTE) or Borderline-SMOTE.
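
A minimal equisampling sketch in R, assuming hist_data carries a churn factor with levels "churn" and "survivor" and that n_per_class is the desired per-class count (both assumptions):

    churns    <- hist_data[hist_data$churn == "churn", ]
    survivors <- hist_data[hist_data$churn == "survivor", ]

    set.seed(1)
    train <- rbind(
      survivors[sample(nrow(survivors), n_per_class), ],            # undersample majority
      churns[sample(nrow(churns), n_per_class, replace = TRUE), ])  # oversample minority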

Further, the feature engineering module 220 selects the training data set so as to approximate the distribution of the feature space of the actual population. More specifically, the historical data points are clustered into K groups (e.g., stratified clustering). Then, from each group, a number of data points is selected, keeping the proportion sampled from each group the same. As such, more samples are drawn from denser regions and fewer from sparse regions, and the distribution of the data is maintained in the sample set (e.g., as compared to the entire historical set of data). This method may be performed as a hybrid form of stratified subsampling where the strata are the clusters.
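
One way to realize this in R, sketched below under the assumption that the numeric columns of hist_data define the feature space and that a 10% sampling fraction is desired, is to cluster with k-means and sample the same fraction from every cluster:

    num_cols <- sapply(hist_data, is.numeric)
    km <- kmeans(scale(hist_data[, num_cols]), centers = K)  # K clusters as strata

    frac <- 0.10  # assumed per-cluster sampling fraction
    idx <- unlist(lapply(split(seq_len(nrow(hist_data)), km$cluster),
                         function(rows)
                           rows[sample.int(length(rows),
                                           max(1, round(frac * length(rows))))]))
    train <- hist_data[idx, ]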

Once the training set has been identified by the feature engineering module 220, the model training module 230 trains the prediction model using the selected attributes ("training attributes") and the training data set identified by the feature engineering module 220. In the example embodiment, the model training module 230 builds a decision tree based on information entropy. In other words, the prediction model becomes a decision tree, once built. As such, the terms "decision tree" and "prediction model" may be used interchangeably herein.

In the example embodiment, the model training module 230 uses the C4.5 algorithm to build the decision tree. Each non-leaf node in the resulting decision tree represents a decision point based on one of the training attributes. Each non-leaf node, or decision point, identifies an attribute associated with the node, as well as one or more threshold levels or discrete values associated with the node's assigned attribute. The non-leaf nodes have one or more child nodes attached to them (e.g., one associated with being over a threshold, the other associated with being under the threshold). Each leaf node has no children and, as such, represents a final classification value (e.g., either "churn" or "survivor").

To build the decision tree, starting from the root node, and recursively for each subsequent child node created, the model training module 230 computes information gain values (IGVs) for each of the remaining training attributes (e.g., those attributes not yet addressed as a decision point by a node in the present node's direct lineage back to the root node), over the training data set. The remaining training attribute having the highest IGV is selected and assigned as the attribute associated with the present node.

Further, a threshold value is computed for the present node. The node's threshold value, combined with the particular attribute of the node, defines the rule for which direction or branch of the decision tree is taken during a prediction analysis for a target subscriber. In other words, and for example, if the target subscriber's attribute is above the threshold value, the analysis would branch to a first child node; otherwise, the analysis would branch to a second child node. The model training module 230 computes the threshold value based on the C4.5 algorithm (e.g., based on the IGV).

If the present node is determined to have one or more children (i.e., to be a non-leaf node), then the model training module 230 creates each of the child nodes and recurses for each (e.g., evaluating each child node as described above). If the present node is determined to be a leaf node, as described above, then no deeper recursion is performed on the present node, and the present node's recursion terminates and returns. This determination is made as a part of the C4.5 algorithm, but is based on a lower bound value (e.g., a "minimum bucket size") provided to the algorithm that determines when to terminate the recursion (e.g., to stop splitting the current bucket into another lower level). As such, overfitting may be controlled by the model training module 230 (e.g., to avoid too large or too deep a tree). In some embodiments, the model training module 230 also disables pruning in the C4.5 algorithm.
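
As a sketch of this training step, Weka's J48 classifier (an open-source C4.5 implementation) can be invoked from R through the RWeka package; the minimum bucket size of 50 is an assumed value:

    library(RWeka)

    # Unpruned C4.5 tree (U) with a minimum bucket size per leaf (M).
    tree <- J48(churn ~ ., data = train,
                control = Weka_control(U = TRUE, M = 50))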

Further, the model training module 230 also introduces a bias by increasing the prediction threshold for labeling a bucket (e.g., leaf node) as churn to higher than 50%. For example, the model training module 230 may increase the threshold to 70% or 80%. In other words, a leaf node with less than, for example, 70% of the remaining samples being churns would result in that bucket being labelled as "survivor" rather than "churn." This bias may reduce the number of false alarms generated by the prediction model.
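
One way to approximate this bias after training is to threshold the leaf's churn probability rather than taking the majority label, as in the following sketch (the 0.7 cutoff follows the example above, and the same call works on target data):

    probs  <- predict(tree, newdata = train, type = "probability")
    labels <- ifelse(probs[, "churn"] > 0.7, "churn", "survivor")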

The churn prediction engine 150 also includes a precision improvement module 240 that focuses on improving the performance of the prediction models for subscribers. The prediction models described herein may generate a number of "false alarms," based on training data or during application of the model to the target subscribers. The term "false alarm" is used to refer to the situation when the prediction model generates a prediction (e.g., classification) that a particular seller (historical or current) is in the churn class when the seller is actually a survivor (e.g., for that particular month). The term "precision" is used herein to refer to how reliably those classified as churns actually churn. In other words, better precision yields a lower number of false alarms. Further, the term "recall" is used herein to refer to how reliably the actual churns were properly identified in the churn class. In some situations, precision may be improved, but at the expense of recall, a phenomenon referred to herein as the "precision/recall tradeoff."

In one embodiment, the precision improvement module 240 performs data segmentation activities to, for example, help improve the performance of the churn prediction engine 150. The precision improvement module 240 may alter the functioning of the feature engineering module 220 in selecting the training data. More specifically, the precision improvement module 240 may segment the training data into multiple "segmented training sets" based on certain features, attributes, or categories of subscribers that exhibit different responses than others. These segmentations may be performed on certain criteria that exhibit clusters of similarly-behaving subscribers.

For example, in one embodiment, the precision improvement module 240 segments training subscribers based on the length of their account history (e.g., account age, or subscription age). Sellers who have been long-time subscribers may behave differently than medium- or short-term subscribers (e.g., those with relatively newer accounts). As such, the precision improvement module 240 may segment the training data into two or more training sets, such as, for example, a long-term set (e.g., account age >= 1 year) and a short-term set (e.g., account age < 1 year), or a long-term set (e.g., account age >= 2 years), a medium-term set (e.g., account age >= 6 months and < 2 years), and a short-term set (e.g., account age < 6 months).

In other embodiments, the precision improvement module 240 segments training data based on Gross Merchandise Value (GMV). For example, the precision improvement module 240 may segment sellers into a high tier GMV (e.g., GMV >= $150) and a low tier GMV (e.g., GMV < $150). Further, in some embodiments, the precision improvement module 240 segments on multiple attributes, such as a combination of account age and GMV.

After data segmentation, the model training module 230 generates multiple prediction models, one for each segmented training set. As such, each prediction model caters to (e.g., more precisely predicts) a certain segment of subscribers.
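
A sketch of this segmentation-then-training flow in R, combining the account-age and GMV cut points from the examples above (the column names account_age_months and gmv are assumptions):

    train$segment <- interaction(
      cut(train$account_age_months, breaks = c(-Inf, 6, 24, Inf),
          labels = c("short", "medium", "long")),
      ifelse(train$gmv >= 150, "high_gmv", "low_gmv"))

    # One prediction model per segmented training set.
    models <- lapply(split(train, train$segment), function(seg) {
      seg$segment <- NULL  # drop the grouping column before training
      J48(churn ~ ., data = seg,
          control = Weka_control(U = TRUE, M = 50))
    })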

Once the prediction model(s) are generated by the model training module 230, as optionally coordinated by the precision improvement module 240, the churn prediction module 250 applies the prediction model(s) to one or more "target subscribers" (e.g., subscribers whose future subscription data is not yet known). In other words, the churn prediction module 250 applies recent target user data of the target subscriber to at least one of the generated prediction models to generate a churn prediction for the target subscriber (e.g., categorizing the target subscriber as either a churn or a survivor).

In the example embodiment, the churn prediction module 250 receives recent attribute data for the target subscriber. More specifically, because the prediction models operate based on a subset of attributes (e.g., the training attributes determined by the feature engineering module 220), the churn prediction module 250 receives recent data for the target subscriber for at least that particular subset of attributes. During application of the prediction model (e.g., the decision tree), the churn prediction module 250 traverses down the tree, starting from the root node. At each level of the descent, if the node is a non-leaf node, the churn prediction module 250 determines what attribute is associated with that node, computes or determines a target value for that attribute based on the target attribute data, compares the target value to the threshold value of the node, and makes a branching decision to one child or another. After the decision is made, the branch is taken, and the next node is investigated in similar fashion. This descent through the tree continues until a leaf node is reached. The target subscriber is then categorized based on the leaf node at which the process has arrived, thereby generating a churn prediction for the target subscriber.
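
With an RWeka-style model as in the earlier sketches, this application step reduces to a single call; target here is an assumed one-row data frame holding the target subscriber's recent values for the training attributes:

    churn_prediction <- predict(tree, newdata = target, type = "class")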

The churn prediction module 250 may also track the path through the decision tree, thereby identifying the results of each decision point at each node. The churn prediction module 250 may provide this data to an analyst (e.g., through a graphical user interface), thereby enabling the analyst to study the results of each step and possibly pinpoint influential data as to why the target user is likely to churn. Further, the churn prediction module 250 may also track the resultant target values for the target subscriber at each node. This data may provide an insight to analysts as to how close the target subscriber was to a particular threshold (e.g., barely on one side of the threshold) or how weighted the target subscriber was with regard to that attribute (e.g., heavily to one side of the threshold).
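
The following sketch shows one way such path tracking could be implemented over a hand-rolled tree representation, where each node holds an attr name, a threshold, and left/right children, or a leaf label; this structure is an assumption for illustration, not RWeka's internal format:

    classify <- function(node, target, path = character()) {
      if (!is.null(node$label))  # leaf node: final churn/survivor class
        return(list(label = node$label, path = path))
      value <- target[[node$attr]]
      above <- value > node$threshold
      # Record the decision taken at this node, and how close the value was.
      step <- sprintf("%s = %.2f (%s threshold %.2f)",
                      node$attr, value, if (above) "above" else "below",
                      node$threshold)
      classify(if (above) node$right else node$left, target, c(path, step))
    }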

In some embodiments, the precision improvement module 240 also tracks repetitive false alarms generated by the prediction model(s) (e.g., during application of the prediction model(s) to target subscribers). If a particular subscriber is miscategorized as a churn for a pre-determined number of consecutive time periods (e.g., 3 months or 6 months in a row), or a pre-determined number of time periods within a fixed window (e.g., 4 months of the most recent 6-month period), then the subscriber may be considered resistive to churn, and the precision improvement module 240 may relabel that subscriber as a survivor. As such, during later evaluations, this "resistive subscriber" may be categorized as a survivor regardless of the churn prediction model.
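
A sketch of the consecutive-false-alarm rule, assuming a monthly history data frame with a subscriber_id column and a logical false_alarm column ordered oldest to most recent, and a three-month cutoff:

    streak <- function(fa) {
      r <- rle(rev(fa))  # run lengths, starting from the most recent month
      if (r$values[1]) r$lengths[1] else 0
    }
    resistive_ids <- names(which(
      tapply(history$false_alarm, history$subscriber_id,
             function(fa) streak(fa) >= 3)))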

Analysts may study the results generated for the target subscribers by the prediction model(s) and, for those predicted as churns, the analysts may approach the subscribers and attempt to mitigate some of the factors causing the subscriber to be likely to churn.

Further, the precision improvement module 240 may apply the prediction model(s) to many subscribers. Depending on the number of subscribers analyzed, this may generate a set of subscribers likely to churn, but that set may be too large to feasibly address (e.g., due to constraints such as logistics, costs, or time). As such, the prediction ranking module 260 ranks the target subscribers to generate a ranked list from which the analyst may work.

More specifically, in one embodiment, the prediction ranking module 260 ranks all customers based on an attribute or feature attr (e.g., GMV). Further, step_size is a user-defined value which indicates the range of attr within which subscribers are treated as almost the same (e.g., $10). The initial offset is the smallest value of this attribute from which counting of the step size starts. Presume the subscriber data is stored in a dataframe temp. The example code below is in the statistical programming language R, and tags every customer with a rank based upon its attribute value. This algorithm adopts a divide-and-conquer paradigm similar to merge sort:

    # Bin subscribers on attr (e.g., GMV) in increments of step_size,
    # starting from initial_offset, and assign an initial rank per bin.
    v <- seq(initial_offset, max(temp[[attr]]), step_size)
    temp$bin  <- findInterval(temp[[attr]], v)
    ranks     <- sort(unique(temp$bin))
    temp$rank <- match(temp$bin, ranks)
    temp      <- temp[order(temp$rank), ]

    # Divide and conquer, similar to merge sort: if any rank bucket holds
    # more than MAX_CUSTOMERS_PER_RANK subscribers, split the segment at its
    # median row and re-rank each half recursively; otherwise assign each
    # row a scaled rank within the segment.
    mergerank <- function(df, ranks, offset, attr) {
      width <- (max(df[[attr]]) - min(df[[attr]])) / ranks
      if (width <= 0) return(df)  # all values equal; avoid division by zero
      counts  <- table(cut(df[[attr]], breaks = ranks))
      maxbin  <- max(counts)
      lwrrank <- floor(ranks / 2)
      if (maxbin > MAX_CUSTOMERS_PER_RANK && lwrrank >= 1 &&
          ranks > lwrrank && width > 1) {
        lwrmedian <- floor(nrow(df) / 2)
        lwrhalf <- mergerank(df[1:lwrmedian, ], lwrrank, offset, attr)
        uprhalf <- mergerank(df[(lwrmedian + 1):nrow(df), ],
                             ranks - lwrrank, offset + lwrrank, attr)
        df <- rbind(lwrhalf, uprhalf)
      } else {
        vrange <- seq(min(df[[attr]]), max(df[[attr]]), width)
        df$srank <- offset + findInterval(df[[attr]], vrange,
                                          rightmost.closed = TRUE,
                                          all.inside = TRUE)
      }
      df
    }

    temp <- mergerank(temp, max(temp$rank), 0, attr)

Considering computational performance, the churn prediction engine 150 may employ any of a number of enhancements to improve the efficiency and processing of the systems and methods described herein. For example, Weka may be used for modeling, and R may be used for exploratory analysis. As another example, historical data for subscribers and their associated information may be stored in a variety of databases that may not be optimized for these operations, or the data may reside primarily in production databases that may not be able to support or accommodate the operations described herein. As such, the historical data and target data may be stored in a secondary database, or in a data warehouse (e.g., where data is synced from a production database on a periodic basis). Further, an attribute aggregator or pre-processor may query a large set of data to, for example, generate a subset of attribute data for a set of subscribers (e.g., only 30 pre-selected attributes), or a subset of subscribers' data (e.g., only newer subscribers), or a particular time period of subscribers' data (e.g., the last 1 year of data). As such, this pre-processing may improve the performance of the churn prediction engine 150 by, for example, clearing away extraneous data, or data that is not used by these systems and methods.

FIG. 3 illustrates a computerized method 600, in accordance with an example embodiment, for predicting churn. The computerized method 600 is performed by a computing device comprising at least one processor and a memory. In the example embodiment, the computerized method 600 includes identifying a historical sample set of subscriber data in a memory at operation 610. The method 600 also includes identifying a set of attributes within the historical sample set at operation 620. The method 600 further includes automatically selecting a subset of attributes from the set of attributes based on an information gain value of each attribute of the set of attributes at operation 630. In some embodiments, automatically selecting the subset of attributes at operation 630 further includes selecting attributes having an information gain value above a pre-determined threshold.

At operation 640, the method 600 includes generating a decision tree based on the selected subset of attributes and the historical sample set. Generating the decision tree (e.g., operation 640) further includes recursively generating nodes of the decision tree starting from a root node, each non-leaf node of the decision tree representing an attribute from the subset of attributes. Generating a first non-leaf node of the decision tree includes computing an information gain value for each remaining attribute of the subset of attributes at operation 642, identifying a highest attribute having the highest information gain value at operation 644, and assigning the highest attribute to the first non-leaf node at operation 646.

At operation 650, the method 600 includes receiving target data for a target subscriber, the target data including each attribute in the subset of attributes. At operation 660, the method 600 includes applying the target data to the decision tree, thereby generating a churn prediction for the target subscriber. At operation 670, the method includes identifying the target subscriber as a churn risk based on the churn prediction.

In some embodiments, the method 600 includes computing the information gain value of each attribute of the set of attributes, wherein automatically selecting the subset of attributes further includes selecting a pre-determined number of attributes having the highest information gain values. In some embodiments, the method 600 includes receiving an indication of an analyst-identified attribute, and adding the analyst-identified attribute to the selected subset of attributes for inclusion in generating the decision tree. In some embodiments, identifying the historical sample set at operation 610 further includes selecting the historical sample set from a pool of historical samples based on a distribution of a feature space of the pool of historical samples, wherein selecting the historical sample set further includes clustering the historical samples into K clusters, and performing stratified subsampling of the historical samples using the K clusters as strata.

In some embodiments, the decision tree includes a leaf node, and generating the decision tree at operation 640 further includes biasing the leaf node toward the survivor class by labeling the leaf node as a churn only if the percentage of remaining samples at the leaf node labeled as churns is above a pre-determined threshold, the pre-determined threshold being higher than 50%. In some embodiments, identifying the historical sample set at operation 610 further includes selecting the historical sample set from a pool of historical samples, the selecting including segmenting the historical sample set of subscriber data based on one or more of account age, subscription age, and gross merchandise value (GMV).

Certain embodiments are described herein as including logic or a number of components, modules, or mechanisms. Modules may constitute either software modules (e.g., code embodied on a machine-readable medium) or hardware modules. A "hardware module" is a tangible unit capable of performing certain operations and may be configured or arranged in a certain physical manner. In various example embodiments, one or more computer systems (e.g., a standalone computer system, a client computer system, or a server computer system) or one or more hardware modules of a computer system (e.g., a processor or a group of processors) may be configured by software (e.g., an application or application portion) as a hardware module that operates to perform certain operations as described herein.

In some embodiments, a hardware module may be implemented mechanically, electronically, or any suitable combination thereof. For example, a hardware module may include dedicated circuitry or logic that is permanently configured to perform certain operations. For example, a hardware module may be a special-purpose processor, such as a Field-Programmable Gate Array (FPGA) or an Application Specific Integrated Circuit (ASIC). A hardware module may also include programmable logic or circuitry that is temporarily configured by software to perform certain operations. For example, a hardware module may include software executed by a general-purpose processor or other programmable processor. Once configured by such software, hardware modules become specific machines (or specific components of a machine) uniquely tailored to perform the configured functions and are no longer general-purpose processors. It will be appreciated that the decision to implement a hardware module mechanically, in dedicated and permanently configured circuitry, or in temporarily configured circuitry (e.g., configured by software) may be driven by cost and time considerations.

Accordingly, the phrase "hardware module" should be understood to encompass a tangible entity, be that an entity that is physically constructed, permanently configured (e.g., hardwired), or temporarily configured (e.g., programmed) to operate in a certain manner or to perform certain operations described herein. As used herein, "hardware-implemented module" refers to a hardware module. Considering embodiments in which hardware modules are temporarily configured (e.g., programmed), each of the hardware modules need not be configured or instantiated at any one instance in time. For example, where a hardware module comprises a general-purpose processor configured by software to become a special-purpose processor, the general-purpose processor may be configured as respectively different special-purpose processors (e.g., comprising different hardware modules) at different times. Software accordingly configures a particular processor or processors, for example, to constitute a particular hardware module at one instance of time and to constitute a different hardware module at a different instance of time.

Hardware modules can provide information to, and receive information from, other hardware modules. Accordingly, the described hardware modules may be regarded as being communicatively coupled. Where multiple hardware modules exist contemporaneously, communications may be achieved through signal transmission (e.g., over appropriate circuits and buses) between or among two or more of the hardware modules. In embodiments in which multiple hardware modules are configured or instantiated at different times, communications between such hardware modules may be achieved, for example, through the storage and retrieval of information in memory structures to which the multiple hardware modules have access. For example, one hardware module may perform an operation and store the output of that operation in a memory device to which it is communicatively coupled. A further hardware module may then, at a later time, access the memory device to retrieve and process the stored output. Hardware modules may also initiate communications with input or output devices, and can operate on a resource (e.g., a collection of information).

The various operations of example methods described herein may be performed, at least partially, by one or more processors that are temporarily configured (e.g., by software) or permanently configured to perform the relevant operations. Whether temporarily or permanently configured, such processors may constitute processor-implemented modules that operate to perform one or more operations or functions described herein. As used herein, "processor-implemented module" refers to a hardware module implemented using one or more processors.

Similarly, the methods described herein may be at least partially processor-implemented, with a particular processor or processors being an example of hardware. For example, at least some of the operations of a method may be performed by one or more processors or processor-implemented modules. Moreover, the one or more processors may also operate to support performance of the relevant operations in a "cloud computing" environment or as "software as a service" (SaaS). For example, at least some of the operations may be performed by a group of computers (as examples of machines including processors), with these operations being accessible via a network (e.g., the Internet) and via one or more appropriate interfaces (e.g., an Application Program Interface (API)).

The performance of certain of the operations may be distributed among the processors, not only residing within a single machine, but deployed across a number of machines. In some example embodiments, the processors or processor-implemented modules may be located in a single geographic location (e.g., within a home environment, an office environment, or a server farm). In other example embodiments, the processors or processor-implemented modules may be distributed across a number of geographic locations.

The modules, methods, applications, and so forth described in conjunction with FIGS. 1 and 2 are implemented in some embodiments in the context of a machine and an associated software architecture. The sections below describe representative software architecture(s) and machine (e.g., hardware) architecture that are suitable for use with the disclosed embodiments.

Software architectures are used in conjunction with hardware architectures to create devices and machines tailored to particular purposes. For example, a particular hardware architecture coupled with a particular software architecture will create a mobile device, such as a mobile phone, tablet device, or so forth. A slightly different hardware and software architecture may yield a smart device for use in the "internet of things," while yet another combination produces a server computer for use within a cloud computing architecture. Not all combinations of such software and hardware architectures are presented here, as those of skill in the art can readily understand how to implement the invention in different contexts from the disclosure contained herein.

FIG. 4 is a block diagram 400 illustrating a representative software architecture 402, which may be used in conjunction with various hardware architectures herein described. FIG. 4 is merely a non-limiting example of a software architecture, and it will be appreciated that many other architectures may be implemented to facilitate the functionality described herein. The software architecture 402 may be executing on hardware such as machine 500 of FIG. 5 that includes, among other things, processors 510, memory 530, and I/O components 550. A representative hardware layer 404 is illustrated and can represent, for example, the machine 500 of FIG. 5. The representative hardware layer 404 comprises one or more processing units 406 having associated executable instructions 408. Executable instructions 408 represent the executable instructions of the software architecture 402, including implementation of the methods, modules, and so forth of FIGS. 1 and 2. The hardware layer 404 also includes memory or storage modules 410, which also have executable instructions 408. The hardware layer 404 may also comprise other hardware as indicated by 412, which represents any other hardware of the hardware layer 404, such as the other hardware illustrated as part of machine 500.

In the example architecture of FIG. 4, the software 402 may be conceptualized as a stack of layers where each layer provides particular functionality. For example, the software 402 may include layers such as an operating system 414, libraries 416, frameworks/middleware 418, applications 420, and a presentation layer 422. Operationally, the applications 420 or other components within the layers may invoke application programming interface (API) calls 424 through the software stack and receive a response, returned values, and so forth, illustrated as messages 426, in response to the API calls 424. The layers illustrated are representative in nature, and not all software architectures have all layers. For example, some mobile or special purpose operating systems may not provide a frameworks/middleware layer 418, while others may provide such a layer. Other software architectures may include additional or different layers.

The operating system 414 may manage hardware resources and provide common services. The operating system 414 may include, for example, a kernel 428, services 430, and drivers 432. The kernel 428 may act as an abstraction layer between the hardware and the other software layers. For example, the kernel 428 may be responsible for memory management, processor management (e.g., scheduling), component management, networking, security settings, and so on. The services 430 may provide other common services for the other software layers. The drivers 432 may be responsible for controlling or interfacing with the underlying hardware. For instance, the drivers 432 may include display drivers, camera drivers, Bluetooth® drivers, flash memory drivers, serial communication drivers (e.g., Universal Serial Bus (USB) drivers), Wi-Fi® drivers, audio drivers, power management drivers, and so forth, depending on the hardware configuration.

The libraries 416 may provide a common infrastructure that may be utilized by the applications 420 or other components or layers. The libraries 416 typically provide functionality that allows other software modules to perform tasks in an easier fashion than interfacing directly with the underlying operating system 414 functionality (e.g., kernel 428, services 430, or drivers 432). The libraries 416 may include system 434 libraries (e.g., C standard library) that may provide functions such as memory allocation functions, string manipulation functions, mathematic functions, and the like. In addition, the libraries 416 may include API libraries 436 such as media libraries (e.g., libraries to support presentation and manipulation of various media formats such as MPEG4, H.264, MP3, AAC, AMR, JPG, PNG), graphics libraries (e.g., an OpenGL framework that may be used to render 2D and 3D graphic content on a display), database libraries (e.g., SQLite, which may provide various relational database functions), web libraries (e.g., WebKit, which may provide web browsing functionality), and the like. The libraries 416 may also include a wide variety of other libraries 438 to provide many other APIs to the applications 420 and other software components/modules.

The frameworks 418 (also sometimes referred to as middleware) may provide a higher-level common infrastructure that may be utilized by the applications 420 or other software components/modules. For example, the frameworks 418 may provide various graphic user interface (GUI) functions, high-level resource management, high-level location services, and so forth. The frameworks 418 may provide a broad spectrum of other APIs that may be utilized by the applications 420 or other software components/modules, some of which may be specific to a particular operating system or platform.

The applications 420 include built-in applications 440 and third party applications 442. Examples of representative built-in applications 440 may include, but are not limited to, a contacts application, a browser application, a book reader application, a location application, a media application, a messaging application, or a game application. Third party applications 442 may include any of the built-in applications as well as a broad assortment of other applications. In a specific example, the third party application 442 (e.g., an application developed using the Android™ or iOS™ software development kit (SDK) by an entity other than the vendor of the particular platform) may be mobile software running on a mobile operating system such as iOS™, Android™, Windows® Phone, or other mobile operating systems. In this example, the third party application 442 may invoke the API calls 424 provided by the mobile operating system, such as operating system 414, to facilitate functionality described herein.

The applications 420 may utilize built-in operating system functions (e.g., kernel 428, services 430, or drivers 432), libraries (e.g., system 434, APIs 436, and other libraries 438), and frameworks/middleware 418 to create user interfaces to interact with users of the system. Alternatively, or additionally, in some systems, interactions with a user may occur through a presentation layer, such as presentation layer 444. In these systems, the application/module "logic" can be separated from the aspects of the application/module that interact with a user.

Some software architectures utilize virtual machines. In the example of FIG. 4, this is illustrated by virtual machine 448. A virtual machine creates a software environment where applications/modules can execute as if they were executing on a hardware machine (such as the machine of FIG. 5, for example). A virtual machine is hosted by a host operating system (operating system 414 in FIG. 4) and typically, although not always, has a virtual machine monitor 446, which manages the operation of the virtual machine as well as the interface with the host operating system (i.e., operating system 414). A software architecture executes within the virtual machine, such as an operating system 450, libraries 452, frameworks/middleware 454, applications 456, or presentation layer 458. These layers of software architecture executing within the virtual machine 448 can be the same as corresponding layers previously described or may be different.

In the example embodiment, the churn prediction engine 150 operates as an application in the applications 420 layer. However, in some embodiments, the churn prediction engine 150 may operate in other software layers, or in multiple software layers (e.g., frameworks 418 and applications 420), or in any architecture that enables the systems and methods as described herein.

FIG. 5 is a block diagram illustrating components of a machine 500, according to some example embodiments, able to read instructions from a machine-readable medium (e.g., a machine-readable storage medium 538) and perform any one or more of the methodologies discussed herein. Specifically, FIG. 5 shows a diagrammatic representation of the machine 500 in the example form of a computer system, within which instructions 516 (e.g., software, a program, an application, an applet, an app, or other executable code) for causing the machine 500 to perform any one or more of the methodologies discussed herein may be executed. For example, the instructions may cause the machine to execute the attribute collection module 210, feature engineering module 220, model training module 230, precision improvement module 240, churn prediction module 250, prediction ranking module 260, and so forth. The instructions transform the general, non-programmed machine into a particular machine programmed to carry out the described and illustrated functions in the manner described. In alternative embodiments, the machine 500 operates as a standalone device or may be coupled (e.g., networked) to other machines. In a networked deployment, the machine 500 may operate in the capacity of a server machine or a client machine in a server-client network environment, or as a peer machine in a peer-to-peer (or distributed) network environment. The machine 500 may comprise, but not be limited to, a server computer, a client computer, a personal computer (PC), a tablet computer, a laptop computer, a netbook, a set-top box (STB), a personal digital assistant (PDA), an entertainment media system, a cellular telephone, a smart phone, a mobile device, a wearable device (e.g., a smart watch), a smart home device (e.g., a smart appliance), other smart devices, a web appliance, a network router, a network switch, a network bridge, or any machine capable of executing the instructions 516, sequentially or otherwise, that specify actions to be taken by machine 500. Further, while only a single machine 500 is illustrated, the term "machine" shall also be taken to include a collection of machines 500 that individually or jointly execute the instructions 516 to perform any one or more of the methodologies discussed herein.

The machine 500 may include processors 510, memory 530, and I/O components 550, which may be configured to communicate with each other such as via a bus 502. In an example embodiment, the processors 510 (e.g., a Central Processing Unit (CPU), a Reduced Instruction Set Computing (RISC) processor, a Complex Instruction Set Computing (CISC) processor, a Graphics Processing Unit (GPU), a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Radio-Frequency Integrated Circuit (RFIC), another processor, or any suitable combination thereof) may include, for example, processor 512 and processor 514 that may execute instructions 516. The term “processor” is intended to include a multi-core processor that may comprise two or more independent processors (sometimes referred to as “cores”) that may execute instructions contemporaneously. Although FIG. 5 shows multiple processors, the machine 500 may include a single processor with a single core, a single processor with multiple cores (e.g., a multi-core processor), multiple processors with a single core, multiple processors with multiple cores, or any combination thereof.
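As a purely illustrative aside, and not a disclosed embodiment, the following Python sketch shows instructions executing contemporaneously on multiple cores in the manner described above, here scoring hypothetical subscriber batches against a single hypothetical decision-point threshold (THRESHOLD and score_batch are assumed names for the example).

```python
# Hypothetical sketch only -- parallel scoring across processor cores.
from concurrent.futures import ProcessPoolExecutor

THRESHOLD = 100.0  # assumed decision-point threshold on gross merchandise value

def score_batch(batch):
    """Toy churn flag for one batch: subscribers below the GMV threshold."""
    return [(sub["id"], sub["gmv"] < THRESHOLD) for sub in batch]

if __name__ == "__main__":
    batches = [
        [{"id": "s1", "gmv": 42.0}, {"id": "s2", "gmv": 310.0}],
        [{"id": "s3", "gmv": 87.5}],
    ]
    # Each batch may be scored on a separate core, contemporaneously.
    with ProcessPoolExecutor() as pool:
        for scored in pool.map(score_batch, batches):
            print(scored)
```

Whether such work is spread across the cores of a single processor or across multiple processors is immaterial to the result, consistent with the combinations enumerated above.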

The memory/storage 530 may include a memory 532, such as a main memory or other memory storage, and a storage unit 536, both accessible to the processors 510 such as via the bus 502. The storage unit 536 and memory 532 store the instructions 516 embodying any one or more of the methodologies or functions described herein. The instructions 516 may also reside, completely or partially, within the memory 532, within the storage unit 536, within at least one of the processors 510 (e.g., within the processor's cache memory), or any suitable combination thereof, during execution thereof by the machine 500. Accordingly, the memory 532, the storage unit 536, and the memory of the processors 510 are examples of machine-readable media.

As used herein, “machine-readable medium” means a device able to store instructions and data temporarily or permanently and may include, but is not limited to, random-access memory (RAM), read-only memory (ROM), buffer memory, flash memory, optical media, magnetic media, cache memory, other types of storage (e.g., Electrically Erasable Programmable Read-Only Memory (EEPROM)), or any suitable combination thereof. The term “machine-readable medium” should be taken to include a single medium or multiple media (e.g., a centralized or distributed database, or associated caches and servers) able to store instructions 516. The term “machine-readable medium” shall also be taken to include any medium, or combination of multiple media, that is capable of storing instructions (e.g., instructions 516) for execution by a machine (e.g., machine 500), such that the instructions, when executed by one or more processors of the machine 500 (e.g., processors 510), cause the machine 500 to perform any one or more of the methodologies described herein. Accordingly, a “machine-readable medium” refers to a single storage apparatus or device, as well as “cloud-based” storage systems or storage networks that include multiple storage apparatus or devices. The term “machine-readable medium” excludes transitory signals per se.

The I/O components 550 may include a wide variety of components to receive input, provide output, transmit information, exchange information, capture measurements, and so on. The specific I/O components 550 that are included in a particular machine will depend on the type of machine. For example, portable machines such as mobile phones will likely include a touch input device or other such input mechanisms, while a headless server machine will likely not include such a touch input device. It will be appreciated that the I/O components 550 may include many other components that are not shown in FIG. 5. The I/O components 550 are grouped according to functionality merely to simplify the following discussion, and the grouping is in no way limiting. In various example embodiments, the I/O components 550 may include output components 552 and input components 554. The output components 552 may include visual components (e.g., a display such as a plasma display panel (PDP), a light emitting diode (LED) display, a liquid crystal display (LCD), a projector, or a cathode ray tube (CRT)), acoustic components (e.g., speakers), haptic components (e.g., a vibratory motor, resistance mechanisms), other signal generators, and so forth. The input components 554 may include alphanumeric input components (e.g., a keyboard, a touch screen configured to receive alphanumeric input, a photo-optical keyboard, or other alphanumeric input components), point-based input components (e.g., a mouse, a touchpad, a trackball, a joystick, a motion sensor, or other pointing instrument), tactile input components (e.g., a physical button, a touch screen that provides location or force of touches or touch gestures, or other tactile input components), audio input components (e.g., a microphone), and the like.

In further example embodiments, the I/O components 550 may include biometric components 556, motion components 558, environmental components 560, or position components 562, among a wide array of other components. For example, the biometric components 556 may include components to detect expressions (e.g., hand expressions, facial expressions, vocal expressions, body gestures, or eye tracking), measure biosignals (e.g., blood pressure, heart rate, body temperature, perspiration, or brain waves), identify a person (e.g., voice identification, retinal identification, facial identification, fingerprint identification, or electroencephalogram-based identification), and the like. The motion components 558 may include acceleration sensor components (e.g., an accelerometer), gravitation sensor components, rotation sensor components (e.g., a gyroscope), and so forth. The environmental components 560 may include, for example, illumination sensor components (e.g., a photometer), temperature sensor components (e.g., one or more thermometers that detect ambient temperature), humidity sensor components, pressure sensor components (e.g., a barometer), acoustic sensor components (e.g., one or more microphones that detect background noise), proximity sensor components (e.g., infrared sensors that detect nearby objects), gas sensors (e.g., gas detection sensors to detect concentrations of hazardous gases for safety or to measure pollutants in the atmosphere), or other components that may provide indications, measurements, or signals corresponding to a surrounding physical environment. The position components 562 may include location sensor components (e.g., a Global Positioning System (GPS) receiver component), altitude sensor components (e.g., altimeters or barometers that detect air pressure from which altitude may be derived), orientation sensor components (e.g., magnetometers), and the like.

Communication may be implemented using a wide variety of technologies. The I/O components 550 may include communication components 564 operable to couple the machine 500 to a network 580 or devices 570 via coupling 582 and coupling 572, respectively. For example, the communication components 564 may include a network interface component or other suitable device to interface with the network 580. In further examples, the communication components 564 may include wired communication components, wireless communication components, cellular communication components, Near Field Communication (NFC) components, Bluetooth® components (e.g., Bluetooth® Low Energy), Wi-Fi® components, and other communication components to provide communication via other modalities. The devices 570 may be another machine or any of a wide variety of peripheral devices (e.g., a peripheral device coupled via a Universal Serial Bus (USB)).

Moreover, the communication components 564 may detect identifiers or include components operable to detect identifiers. For example, the communication components 564 may include Radio Frequency Identification (RFID) tag reader components, NFC smart tag detection components, optical reader components (e.g., an optical sensor to detect one-dimensional bar codes such as Universal Product Code (UPC) bar codes, multi-dimensional bar codes such as Quick Response (QR) codes, Aztec codes, Data Matrix, Dataglyph, MaxiCode, PDF417, Ultra Code, UCC RSS-2D bar codes, and other optical codes), or acoustic detection components (e.g., microphones to identify tagged audio signals). In addition, a variety of information may be derived via the communication components 564, such as location via Internet Protocol (IP) geo-location, location via Wi-Fi® signal triangulation, location via detecting an NFC beacon signal that may indicate a particular location, and so forth.

In various example embodiments, one or more portions of the network 580 may be an ad hoc network, an intranet, an extranet, a virtual private network (VPN), a local area network (LAN), a wireless LAN (WLAN), a wide area network (WAN), a wireless WAN (WWAN), a metropolitan area network (MAN), the Internet, a portion of the Internet, a portion of the Public Switched Telephone Network (PSTN), a plain old telephone service (POTS) network, a cellular telephone network, a wireless network, a Wi-Fi® network, another type of network, or a combination of two or more such networks. For example, the network 580 or a portion of the network 580 may include a wireless or cellular network, and the coupling 582 may be a Code Division Multiple Access (CDMA) connection, a Global System for Mobile communications (GSM) connection, or another type of cellular or wireless coupling. In this example, the coupling 582 may implement any of a variety of types of data transfer technology, such as Single Carrier Radio Transmission Technology (1xRTT), Evolution-Data Optimized (EVDO) technology, General Packet Radio Service (GPRS) technology, Enhanced Data rates for GSM Evolution (EDGE) technology, Third Generation Partnership Project (3GPP) technology including 3G, fourth generation wireless (4G) networks, Universal Mobile Telecommunications System (UMTS), High Speed Packet Access (HSPA), Worldwide Interoperability for Microwave Access (WiMAX), the Long Term Evolution (LTE) standard, others defined by various standard-setting organizations, other long-range protocols, or other data transfer technology.

The instructions 516 may be transmitted or received over the network 580 using a transmission medium via a network interface device (e.g., a network interface component included in the communication components 564) and utilizing any one of a number of well-known transfer protocols (e.g., hypertext transfer protocol (HTTP)). Similarly, the instructions 516 may be transmitted or received using a transmission medium via the coupling 572 (e.g., a peer-to-peer coupling) to devices 570. The term “transmission medium” shall be taken to include any intangible medium that is capable of storing, encoding, or carrying instructions 516 for execution by the machine 500, and includes digital or analog communications signals or other intangible medium to facilitate communication of such software.

Throughout this specification, plural instances may implement components, operations, or structures described as a single instance. Although individual operations of one or more methods are illustrated and described as separate operations, one or more of the individual operations may be performed concurrently, and nothing requires that the operations be performed in the order illustrated. Structures and functionality presented as separate components in example configurations may be implemented as a combined structure or component. Similarly, structures and functionality presented as a single component may be implemented as separate components. These and other variations, modifications, additions, and improvements fall within the scope of the subject matter herein.

Although an overview of the inventive subject matter has been described with reference to specific example embodiments, various modifications and changes may be made to these embodiments without departing from the broader scope of embodiments of the present disclosure. Such embodiments of the inventive subject matter may be referred to herein, individually or collectively, by the term “invention” merely for convenience and without intending to voluntarily limit the scope of this application to any single disclosure or inventive concept if more than one is, in fact, disclosed.

The embodiments illustrated herein are described in sufficient detail to enable those skilled in the art to practice the teachings disclosed. Other embodiments may be used and derived therefrom, such that structural and logical substitutions and changes may be made without departing from the scope of this disclosure. The Detailed Description, therefore, is not to be taken in a limiting sense, and the scope of various embodiments is defined only by the appended claims, along with the full range of equivalents to which such claims are entitled.

As used herein, the term “or” may be construed in either an inclusive or exclusive sense. Moreover, plural instances may be provided for resources, operations, or structures described herein as a single instance. Additionally, boundaries between various resources, operations, modules, engines, and data stores are somewhat arbitrary, and particular operations are illustrated in a context of specific illustrative configurations. Other allocations of functionality are envisioned and may fall within a scope of various embodiments of the present disclosure. In general, structures and functionality presented as separate resources in the example configurations may be implemented as a combined structure or resource. Similarly, structures and functionality presented as a single resource may be implemented as separate resources. These and other variations, modifications, additions, and improvements fall within a scope of embodiments of the present disclosure as represented by the appended claims. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.

What is claimed is:
1. A churn prediction system comprising:
at least one hardware processor;
an electronic memory storing instructions that when executed configure the at least one hardware processor to perform operations comprising:
segmenting a historical sample set of subscriber data from a pool of historical samples;
generating, using a programming language for analysis, a plurality of churn predictions for a plurality of target subscribers based at least in part on the segmented historical sample set and target data for the plurality of target subscribers, each churn prediction of the plurality of churn predictions indicating a churn likelihood of a corresponding target subscriber of the plurality of target subscribers;
generating a ranked list of the plurality of target subscribers based at least in part on the plurality of churn predictions; and
causing presentation of the ranked list via a graphical user interface.
 2. The system of claim 1, wherein the instructions are further executable to cause the at least one hardware processor to perform: causing presentation, via the graphical user interface, of a decision tree comprising a plurality of decision points used for generating the plurality of churn predictions for the plurality of target subscribers.
 3. The system of claim 2, wherein the instructions for causing presentation of the decision tree are further executable to cause the at least one hardware processor to perform causing presentation, via the graphical user interface, of an attribute value of a first target subscriber for a first decision point of the plurality of decision points of the decision tree and a threshold value corresponding to the attribute value.
 4. The system of claim 3, wherein the attribute value of the first target subscriber comprises gross merchandise value information for the first target subscriber, active listing information for the first target subscriber, subscription information for the first target subscriber, or a combination thereof.
 5. The system of claim 2, wherein the instructions are further executable to cause the at least one hardware processor to perform: tracking a target value at each decision point of the plurality of decision points of the decision tree for a first target subscriber of the plurality of target subscribers; and causing presentation of the tracked target values via the graphical user interface.
 6. The system of claim 1, wherein the instructions are further executable to cause the at least one hardware processor to perform: generating a plurality of prediction models, each prediction model generated for a respective segment of the segmented historical sample set.
 7. The system of claim 1, wherein the instructions are further executable to cause the at least one hardware processor to perform: receiving target data for an additional target subscriber, the target data comprising a set of attributes for the additional target subscriber; and generating a churn prediction for the additional target subscriber based at least in part on applying the target data to a decision tree.
 8. The system of claim 1, wherein the instructions for generating the plurality of churn predictions are further executable to cause the at least one hardware processor to perform: generating the plurality of churn predictions based at least in part on applying machine learning to the historical sample set of subscriber data.
 9. The system of claim 1, wherein the instructions for segmenting the historical sample set of subscriber data are further executable to cause the at least one hardware processor to perform: segmenting the historical sample set of subscriber data based at least in part on one or more of account age, subscription age, gross merchandise value (GMV), or any combination thereof.
10. A computer-implemented method for predicting subscriber churn comprising:
segmenting, by at least one hardware processor, a historical sample set of subscriber data from a pool of historical samples;
generating, by the at least one hardware processor using a programming language for analysis, a plurality of churn predictions for a plurality of target subscribers based at least in part on the segmented historical sample set and target data for the plurality of target subscribers, each churn prediction of the plurality of churn predictions indicating a churn likelihood of a corresponding target subscriber of the plurality of target subscribers;
generating, by the at least one hardware processor, a ranked list of the plurality of target subscribers based at least in part on the plurality of churn predictions; and
causing presentation of the ranked list via a graphical user interface.
 11. The method of claim 10, further comprising: causing presentation, via the graphical user interface, of a decision tree comprising a plurality of decision points used for generating the plurality of churn predictions for the plurality of target subscribers.
 12. The method of claim 11, wherein causing presentation of the decision tree comprises: causing presentation, via the graphical user interface, of an attribute value of a first target subscriber for a first decision point of the plurality of decision points of the decision tree and a threshold value corresponding to the attribute value.
 13. The method of claim 12, wherein the attribute value of the first target subscriber comprises gross merchandise value information for the first target subscriber, active listing information for the first target subscriber, subscription information for the first target subscriber, or a combination thereof.
 14. The method of claim 11, further comprising: tracking a target value at each decision point of the plurality of decision points of the decision tree for a first target subscriber of the plurality of target subscribers; and causing presentation of the tracked target values via the graphical user interface.
 15. The method of claim 10, further comprising: generating a plurality of prediction models, each prediction model generated for a respective segment of the segmented historical sample set.
 16. The method of claim 10, further comprising: receiving target data for an additional target subscriber, the target data comprising a set of attributes for the additional target subscriber; and generating a churn prediction for the additional target subscriber based at least in part on applying the target data to a decision tree.
17. A non-transitory machine-readable medium storing processor-executable instructions which, when executed by at least one hardware processor, cause the at least one hardware processor to perform operations comprising:
segmenting a historical sample set of subscriber data from a pool of historical samples;
generating, using a programming language for analysis, a plurality of churn predictions for a plurality of target subscribers based at least in part on the segmented historical sample set and target data for the plurality of target subscribers, each churn prediction of the plurality of churn predictions indicating a churn likelihood of a corresponding target subscriber of the plurality of target subscribers;
generating a ranked list of the plurality of target subscribers based at least in part on the plurality of churn predictions; and
causing presentation of the ranked list via a graphical user interface.
 18. The non-transitory machine-readable storage medium of claim 17, wherein the instructions are further executable to cause the at least one hardware processor to perform: causing presentation, via the graphical user interface, of a decision tree comprising a plurality of decision points used for generating the plurality of churn predictions for the plurality of target subscribers.
 19. The non-transitory machine-readable storage medium of claim 18, wherein the instructions for causing presentation of the decision tree are further executable to cause the at least one hardware processor to perform: causing presentation, via the graphical user interface, of an attribute value of a first target subscriber for a first decision point of the plurality of decision points of the decision tree and a threshold value corresponding to the attribute value.
 20. The non-transitory machine-readable storage medium of claim 17, wherein the instructions are further executable to cause the at least one hardware processor to perform: generating a plurality of prediction models, each prediction model generated for a respective segment of the segmented historical sample set. 